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Abstract 

In many applications one may acquire a composition of several signals that may be corrupted by 
noise, and it is a challenging problem to reliably separate the components from one another without 
sacrificing significant details. Adding to the challenge, in a compressive sensing framework, one is given 
only an undersampled set of linear projections of the composite signal. In this paper, we propose using 
the Dantzig selector model incorporating an overcomplete dictionary to separate a noisy undersampled 
collection of composite signals, and present an algorithm to efficiently solve the model. 

The Dantzig selector is a statistical approach to finding a solution to a noisy linear regression problem 
by minimizing the norm of candidate coefficient vectors while constraining the scope of the residuals. 
If the underlying coefficient vector is sparse, then the Dantzig selector performs well in the recovery and 
separation of the unknown composite signal. In the following, we propose a proximity operator based 
algorithm to recover and separate unknown noisy undersampled composite signals through the Dantzig 
selector. We present numerical simulations comparing the proposed algorithm with the competing Al¬ 
ternating Direction Method, and the proposed algorithm is found to be faster, while producing similar 
quality results. Additionally, we demonstrate the utility of the proposed algorithm in several experiments 
by applying it in various domain applications including the recovery of complex-valued coefficient vectors, 
the removal of impulse noise from smooth signals, and the separation and classification of a composition 
of handwritten digits. 
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1 Introduction 


This paper considers the problem of separating a composite signal through the recovery of an underlying 
sparse coefficient vector by using the Dantzig selector given only an incomplete set of noisy linear random 
projections. That is, we discuss the estimation of a coefficient vector c £ C q given the vector 


y = X/3 + z , (1) 

where X £ R nxp is a sensing matrix with n < p, z is a collection of i.i.d. ~ N(0,a 2 ) random variables, and 
the unknown signal f) £ admits the sparse representation f3 = Be for a known overcomplete dictionary 
B £ C pxq . The individual signals composed to form /? can then be recovered from c and B. Since Equation (JT]) 
is underdetermined yet consistent, it presents infinitely many candidate signals /? and coefficient vectors c. 

The Dantzig selector was introduced in [10] as a method for estimating a sparse parameter /? £ R p 
satisfying ©■ Discussions on the Dantzig selector, including comparisons to the least absolute shrinkage 
and selection operator (LASSO), can be found in @1 [6} [10] (TTJ [13Qj] I2H [57 . Both the Dantzig selector and 
LASSO aim for sparse solutions, but whereas LASSO tries to match the image of candidate vectors close to 
the observations, the Dantzig selector aims to bound the predictor of the residuals. When tuning parameters 
in LASSO and the Dantzig selector model are set properly, the LASSO estimate is always a feasible solution 
to the Dantzig selector minimization problem, although it may not be an optimal solution. Furthermore, 
when the corresponding solutions are not identical, the Dantzig selector solution is sparser than the LASSO 
solution in terms of the t\ norm [20] , Recently, the Dantzig selector model has been applied for gene selection 
in cancer classification [5i?j . 

Classical compressive sensing theory guarantees the recovery of a sparse signal given only a very small 
number of linear projections under certain conditions 01ii. However, very seldomly is a naturally 
encountered signal perfectly sparse relative to a single basis. Therefore, a number of works have considered 
the recoverability of signals that are sparse relative to an overcomplete dictionary that is formed by the 
concatenation of several bases or Parseval frames In this work, we propose and analyze 

a Dantzig selector model inspired by the above applications of overcomplete dictionaries in compressive 
sensing, and develop an algorithm for finding solutions to this model. 

The following notation will be used. The absolute value of a scalar a is denoted by |a|, and the number 
of elements in a set T is denoted by |T|. The smallest integer larger than the real number a is denoted by 
[a]. The z th element of a vector x is denoted by x(i), and the i th column of a matrix A is denoted by At. 
The support of a vector x is given by supp(:r) = {* : x{i) ^ 0}. The and ti vector norms, denoted by 


2 


|| • ||i, and || • ||2 respectively, are defined by 


IMIi = I^WI > IMh = (M *)! 2 


for any vector x G C". For matrices A, B with the same number of rows, 


A 


B 


is the horizontal concate¬ 


nation of A and B. Similarly, 


A 

B 


is the vertical concatenation of A and B , provided each has the same 


number of columns. The conjugate transpose of a matrix A is denoted by A T . 

The rest of the paper is organized as follows. In Section 2, the Dantzig selector model incorporating 
overcomplete dictionaries is introduced. In Section 3, we present an algorithm used to find solutions to the 
proposed model. Section 4 presents several numerical experiments demonstrating the appropriateness of the 
model and the accuracy of the results produced by the presented algorithm. In simulations using real-valued 
matrices in the overcomplete dictionary, we compare the efficiency and accuracy of the presented method 
with the competing Alternating Direction Method. Additionally, we demonstrate the utility of the proposed 
algorithm in several experiments by applying it in various domain applications including the recovery of 
complex-valued coefficient vectors, the removal of impulse noise from smooth signals, and the separation and 
classification of a composition of handwritten digits. We close the paper with some remarks and possible 
future directions. 


2 The Dantzig selector model incorporating overcomplete dictio¬ 


naries 


In this section, we present a Dantzig selector model incorporating overcomplete dictionaries that can be used 
to recover an unknown signal and reliably separate overlapping signals. 

Suppose the unknown composite signal (3 is measured via y = X/3 + z, where X is an n xp sensing matrix 
and z models sensor noise, and suppose an overcomplete dictionary B is known such that /3 = Be for some 
sparse c. Although /3 and c are not known, it is reasonable in many applications to know or suspect the 
correct dictionary components. For example, if the signals of interest appear to be sinusoids with occasional 
spikes as in Figure [3] one should use a dictionary that is a concatenation of a discrete Fourier transform 
component and a standard Euclidean basis component. In the following, let q = 2p and assume the p x q 


dictionary B is formed by a horizontal concatenation of a pair of orthonormal bases, B = 


$ ip 


, and 


the components of j3 admit the sparse representations /3$ = and [}■$ = \Fc®, with j3 = /3$ + [}■$ and 
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c = 


. More succinctly, 


c 


T 

<E> 


T 


P 


$ 'll 


C<j> 

C\£ 


To recover c, and therefore also j3 and the components /3$ and By , from the observations y, we propose 
using a solution to the Dantzig selector model (see m) with an overcomplete dictionary 


c G min {||c||i : \\D l B T X T (XBc-y)\\oo < 5} , (2) 

eg C 2p 

where the diagonal matrix D G R 9 * 9 with entries djj = diag{||(XS)j|| 2 } normalizes the sensing-dictionary 
pair. Although Model © is expressed using an overcomplete dictionary with two representation systems, 
one could generalize the model to accomodate more systems. 

If the elements of X are independent and identically distributed random variables from a Gaussian or 
Bernoulli distribution, and B contains elements of fixed, nonrandom bases, then D is invertible. To see this, 
note that djj = 0 if and only if ((A' T )j, Bj) = 0 for all i G {1,2,..., n}. However, since a random sensing 
matrix is largely incoherent with, yet not orthogonal to any fixed basis m na [28], it follows that djj ^ 0 
for each j, ensuring D is invertible. Employing a sensing matrix whose entries are i.i.d. random variables 
sampled from a Gaussian or Bernoulli distribution, paired with an overcomplete dictionary formed by several 
bases or parseval frames has the added benefit of giving small restricted isometry constants, which in turn 
improves the probability of successful recovery of the coefficient vector via i\ minimization. More on these 
concepts, now standard in compressive sensing literature, can be found in 


3 A proximity operator based algorithm 

To compute the Dantzig selector, we characterize a solution of Model © using the fixed point of a system 
of equations involving applications of the proximity operator to the £\ norm. In this section we describe the 
system of equations and their relationship to the solution of Model @ and present an algorithm with an 
iterative approach for finding these solutions. 

Let A = D~ 1 B t X t XB , and define the vector 7 = D~ l B T X T y and the set T = {c : ||c — 7 II 00 < 5}. 
The indicator function ijr : C 2p —> {0, 00 } is defined by 

! 0, if u G T 
+ 00 , if u J- 
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and the proximity operator of a lower semicontinuous convex function / with parameter A 0 is defined by 

prox A/ (a;) = argmin i -U|u - x\\\ + f(u) 
uGC 2 p l 

Then Model © can be expressed in terms of the indicator function as 


c G min {||c||i + if{Ac)} . 

ce c 2 p 


(3) 


If c is a solution to Model (|3|). then for any a, A > 0 there exists a vector r G C 2p such that 


c = prox.i n.i^ — — AJt^J and r = (/ — prox tjr ) ( Ac+ r). 


Furthermore, given a and A, if c and r satisfying the above equations exist, then c is a solution to (|3jl. 
and therefore also to 0. Using the fixed-point characterization above, the (k + l) th iteration of the prox¬ 
imity operator algorithm to find the solution of the Dantzig selector model incorporating an overcomplete 
dictionary is 

c k+1 = proxi. |M|i ( c k - ±A T (2r k - r*" 1 )) , 

< “ ( 4 ) 


T k+1 = (/ - prox t ^) ( Ac k+1 + r k ) . 


If A/a < 1/||A|||, the sequence {(c fc ,r fc )} converges. The proof follows those in [T3J[52]. We remark that the 
proximity operators appearing in Equation 0 can be efficiently computed. More precisely, for any positive 
number A and any vector mGC/ 


P rox A||.|J 1 ( u ) = 


-I T 


prox AM (ui) prox AM (u 2 ) 


prox AM (u 2p ) 


and 

Prox t:F (u) 

where for 1 < i < 2p 


Pr ° X Hl —-ul <*}( Ul ) 


P r 0 X t{l ._ 72 |< 5 } (« 2 ) 


prox Hi -7 d i<n ( U2 p) 


prox A |.|(ztj) = max{|itj| - A,0} 


Uj 

\Ui\ 


and 

P r °* t{ l- T4 l< i} (“i) = 7i+max{|u i -7 i |,a} |"*_^| 

Summarizing the above, one has the following proximity operator based algorithm (POA) for approxi¬ 
mating a solution to Model 0. 
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Algorithm 1 (POA) 

Initial Parameters: The observations y are known, and the sensing matrix X and dictionary B are used 
to define the matrix A, the vector 7 and the set T. A parameter a > 0 is chosen. 

Initial Step: Initial guess r° = r _1 = 0, c° = 0, /3° = 0; A = 0.999a/|| A\\ 2 . 

Main Iterations: Generate the sequence {( c k ,r k ) : k £ N} via the iterative scheme 0 until a stopping 
criterion is met. 

Post-processing: Use the appropriate post-processing scheme to construct the Dantzig estimator c from 
the final output of the main iterative step, and obtain an approximate separation of the signal components 
from c and B. 


The main iteration process will ideally terminate once the sequence {(c fc , r k )} reaches a stationary point. 
In practice, we estimate this by stopping once either of the following are met: 

| D~ 1 B t X t (XBc k -y) 


1 . 


< e, for some 0 < e < 1 , or 


max {||c fe || 2 , 1 } 

2. The support of c k is stationary for y iterations, for some predetermined positive integer y. 


As noted in [TO], the Dantzig selector tends to slightly underestimate solution values within the support 
of the true solution. Let () denote the final product of the Main Iterations step in POA, and define 
the set A = supp{c°°}. Denote by c\ the vector whose elements are chosen from c with indices in A, and by 
B\ the submatrix whose columns are chosen from B with indices in A. The Dantzig selector is estimated 
on A by solving the least squares problem c\ = argmin{||X T (XB\c — y) || 2 } and setting <7 = 0 for i A. 

The main contribution to the computational complexity of POA is the ‘Main Iterations’ procedure. 
Assume the matrices A and A T are computed at the beginning of the algorithm, then recalled for use in 
Equation ff]). For each iteration of the ‘Main Iterations’ stage, computing (r fc , c k ) via Equation ([4]) requires 
0(q 2 ) multiplications, and determining whether the stopping criteria have been met contributes an additional 
0(q 2 ), where q is the length of the coefficient vector c. Therefore, if R iterations are required to complete 
POA, then the overall complexity of the algorithm is 0{Rq 2 ). 


4 Numerical Experiments 

In this section, the separation of noisy undersampled composite signals using POA to solve Model (0 is 
demonstrated. All codes are implemented in MATLAB R2013b on a workstation with an Intel i7-3630QM 
CPU (2.40GHz) and 16GB RAM. 

Four numerical experiments will be presented. In the first three experiments, composite signals are 
simulated, then POA is applied to the noisy incomplete observations y = XBc + z to recover and separate 
the components. The entries of the nxp sensing matrix X are sampled from the standard normal distribution, 
which is then normalized so that each column has unit £ 2 norm. The noise vector z has i.i.d. entries sampled 
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from the normal distribution N(0,a 2 ), with a = 0.01 and 0.05 corresponding to 1% and 5% noise levels 
respectively. To measure the effeciency and accuracy of the algorithm, we use the CPU run time and the 
relative (2 error of the recovery of the separate components. For each experiment, we report the means and 
standard deviations of these measurements over 50 simulations for each set of parameters. 

In the fourth experiment, real-world data taken from the United States Postal Service handwritten digits 
data sets, are used to sample the composite signal and to train the overcomplete dictionary. Given an 
undersampled signal y = X/3, where X is an n x p sensing matrix with entries taken from a Bernoulli 
distribution and /3 is the composite image, POA is used to identify the digits and to approximate the 
individual components in the composite image. 

To demonstrate the utility of POA, we also use the Alternating Direction Method (ADM) developed in 
mmm to estimate a solution to Model (O in Experiment 1. A comparison of the results are summarized 
in Figure Q] and Table [I] POA and the ADM separated the components of a composite signal with similar 
accuracy, however POA was significantly faster. We do not compare POA and the ADM in Experiments 
2 and 3 because these problem domains involve complex-valued dictionaries and coefficients, but the ADM 
was designed to solve real-valued convex optimization problems only. 

The effectiveness of the postprocessing scheme is illustrated in Figure [2] Every other figure and table 
present only results with postprocessing. In the following, time is measured in seconds, and the symbol' 
over a parameter indicates its recovered value via POA with postprocessing. For notational convenience, the 
relative t 2 error of the recovery of a vector x is denoted by E{x) := ||x — x|| 2 /||a;|| 2 - 

4.1 Experiment 1 

In this experiment, we recover and separate a composite signal that has a sparse representation in terms of 
wavelets and discrete cosine transforms. Let /3 represent the composite signal, /3$ the wavelet component and 

the sinusoidal component, each of length p. The signals are simulated by generating a sparse coefficient 
vector c of length 2 p by selecting a support set of size s uniformly at random and sampling c on the support 
set from the distribution N(100, 15). The signals are observed via y = XBc + z : where the overcomplete 
dictionary B is formed by concatenating matrices for the level 5 Haar wavelet decomposition and the discrete 
cosine transform. The simulations are performed 50 times for each m = 1, 2,..., 10 and each a = 0.01, 0.05, 
with parameters p = 256 m + 512, n = p/4, s = [n/9]. The parameters for the stopping criteria of POA are 
e = 10 -4 and ?y = 20. 

Comparisons of the mean and standard deviation of CPU time and relative l 2 recovery error of each of 
the components for 50 simulations at each level of m and a using both POA and the ADM to solve Model ([2]) 


7 


are illustrated in Figure Q] and Table [L] 


□ 




(a) CPU runtime, with a = 0.01. (b) CPU runtime, with a = 0.05. 

Figure 1: Comparisons of CPU runtime of POA and ADM from Experiment 1. The mean CPU 
runtime of the 50 simulations of each algorithm is represented by the points along the curves, 
and the vertical lines represent one standard deviation from the means. The solid line with * 
markers plot the values obtained using POA, and the dotted line with o markers plot the values 
obtained using the ADM. The horizontal axis represents the parameter m determining the size 
of the system, and the vertical axis is measured in seconds. 


4.2 Experiment 2 

In this experiment, we recover and separate a composite signal, with one component /3$ being a dirac spike 
signal with random sparse locations and values, and the other component (3^ being a sinusoidal signal with 
random sparse Fourier coefficients. The parameters used in this experiment are p = 256m, n = 64m, and 
s = \n/ 9] for m = 1,2,..., 10, and stopping criteria parameters e = 10 -4 and p = 6 for a = 0.01 and ?y = 30 
for cr = 0.05. 

For each of the 50 simulations for each value of m and cr, the experiment is set up by generating a vector 
c of length 2 p by selecting a support set S of size s uniformly at random, then sampling c on S with i.i.d. 
entries C(j) = A(j)(l + |a(j)|), where A(j) is 1 or —1 and a(j) ~ IV(0,1). The signal is observed via the 
vector y = XBc + z, with the dictionary B being a concatenation of the identity matrix and the discrete 
Fourier transform matrix, both of size p x p. 

Figure [2] illustrates the accuracy of the numerical separation by comparing the numerically recovered 
composite signal and separated components (denoted by ‘o’) against the exact values of the signal and 
components (denoted by “+’) for one simulation with m = 2 and a = 0.05. Table [2] lists the mean and 
standard deviation of the algorithm run time and the relative li recovery error of each signal component 
over the 50 simulations for each m and a. □ 

















m 

E{0*) 

POA ADM 

E (M 

POA ADM 



Mean 

Std Dev 

Mean 

Std Dev 

Mean 

Std Dev 

Mean 

Std Dev 



(xlO -3 ) 

(xl0~ 3 ) 

(xlO -3 ) 

(xl0~ 3 ) 

(xlO -3 ) 

(xlO -3 ) 

(xlO -3 ) 

(xl0~ 3 ) 


1 

9.0421 

2.2306 

8.6562 

2.0236 

9.1171 

2.9345 

8.7136 

2.7528 

1—1 

0 

2 

8.3235 

2.2061 

8.1769 

2.2004 

8.3946 

2.0634 

8.3330 

2.2880 

0 

3 

8.3875 

2.2645 

8.3397 

2.2484 

8.6408 

2.4491 

8.5115 

2.3044 

11 

4 

7.7607 

1.3936 

7.7985 

1.3568 

7.6132 

1.5551 

7.6046 

1.5981 


5 

8.2638 

1.4670 

8.2121 

1.6065 

7.5479 

1.3118 

7.6928 

1.4544 

> 

CD 

6 

8.0279 

1.4589 

8.0491 

1.5101 

7.7504 

1.4392 

7.7910 

1.6199 


7 

7.7411 

1.2514 

7.6378 

1.1422 

7.9663 

1.4901 

7.9039 

1.4730 

'0 

8 

8.2142 

1.3920 

8.1150 

1.3064 

7.9014 

1.3367 

7.8035 

1.3457 

£ 

9 

7.8209 

1.0384 

7.8440 

1.1215 

8.1155 

1.4212 

8.0859 

1.4590 


10 

7.9987 

1.2901 

7.9806 

1.2391 

7.9123 

1.2342 

7.9213 

1.1699 



Mean 

Std Dev 

Mean 

Std Dev 

Mean 

Std Dev 

Mean 

Std Dev 



(xlO -2 ) 

(xl0“ 3 ) 

(xlO -2 ) 

(xl0~ 3 ) 

(xlO -2 ) 

(xlO -3 ) 

(xlO -2 ) 

(xl0~ 3 ) 


1 

4.4187 

14.921 

4.2647 

13.274 

4.2368 

13.321 

4.1104 

12.031 

10 

O 

2 

3.9471 

10.951 

3.8098 

10.353 

3.9301 

10.570 

3.9203 

10.816 

O 

3 

3.9268 

9.6222 

3.8574 

9.0583 

3.9068 

7.8635 

3.8550 

7.4182 

11 

4 

3.9723 

7.8517 

3.9302 

7.3932 

4.0431 

12.237 

3.9847 

11.224 


5 

4.0825 

8.4696 

4.0847 

8.1707 

3.8121 

6.9432 

3.8310 

6.7285 

> 

0) 

6 

3.9041 

4.9101 

3.8963 

5.0425 

3.8486 

7.1261 

3.7879 

7.3865 


7 

4.0000 

5.8845 

3.9813 

5.4931 

4.1028 

8.0082 

4.1050 

7.6549 

m 

’0 

8 

3.8323 

6.2478 

3.8310 

6.3490 

3.9243 

7.4910 

3.9327 

7.7148 

£ 

9 

3.8040 

6.4910 

3.8195 

6.3716 

3.9178 

5.7576 

3.9010 

5.6531 


10 

3.9514 

6.2999 

3.9518 

6.2383 

3.9898 

6.3100 

3.9677 

6.3825 


Table 1: Comparisons of the relative 1 2 errors of the recovered components from Experiment 1 
using POA and ADM. The means and standard deviations over 50 simulations of the relative 
errors are given for the recovery of each component in the original signal, and for each parameter 
m and noise level a. 
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m 

Time (Seconds) 







Mean 

Std Dev 

Mean 

Std Dev 



Mean 

Std Dev 

(xlO -3 ) 

(xlO -4 ) 

(xl0~ 3 ) 

(xlO -4 ) 


1 

2.8118xl0~ 2 

5.8329xl0" 3 

6.4762 

15.851 

6.7300 

12.374 

1—1 

0 

2 

6.2733xl0 -2 

2.1378xl0~ 2 

6.4622 

8.6297 

6.7490 

8.6886 

0 

3 

0.1351 

7.5995xl0" 2 

6.6026 

8.7737 

6.4949 

7.8729 

11 

4 

0.4025 

0.2948 

6.7021 

8.2956 

6.6030 

6.3652 


5 

0.9603 

0.7242 

6.7423 

6.6547 

6.7797 

6.6109 

> 

a? 

6 

1.5431 

1.3161 

6.6408 

5.1202 

6.6585 

4.4381 

h-} 

7 

3.0107 

2.2266 

6.8240 

5.9514 

6.8955 

6.7195 

.22 

0 

8 

3.5928 

2.6642 

6.8223 

6.2282 

6.6457 

4.2048 

z 

9 

5.2908 

3.7904 

6.6667 

5.3515 

6.5987 

4.0810 


10 

8.6188 

4.7235 

6.8529 

5.7726 

6.8113 

4.5567 





Mean 

Std Dev 

Mean 

Std Dev 



Mean 

Std Dev 

(xlO -2 ) 

(xlO -3 ) 

(xl0~ 2 ) 

(xlO -3 ) 


1 

0.1076 

3.3335xl0 -2 

3.4973 

6.6099 

3.3185 

6.6436 

LO 

O 

2 

0.2615 

6.4574xl0~ 2 

3.5430 

4.2359 

3.4794 

4.3713 

O 

3 

0.6093 

0.1218 

3.4250 

4.4838 

3.3384 

3.4251 

II 

4 

1.7032 

0.3350 

3.3804 

3.2731 

3.3572 

3.2625 


5 

3.6084 

0.5986 

3.4346 

2.8969 

3.3914 

2.6619 

> 

o> 

6 

5.5694 

0.9674 

3.4677 

2.8924 

3.3116 

2.6373 

h-} 

7 

7.5435 

1.1237 

3.4493 

2.7535 

3.3879 

1.9975 

*o 

8 

9.9614 

1.1852 

3.4638 

2.3516 

3.4088 

2.2992 

z 

9 

12.8380 

1.7008 

3.4874 

2.3352 

3.3968 

2.5498 


10 

15.4924 

1.4241 

3.4326 

2.3453 

3.4374 

1.4001 


Table 2: The means and standard deviations of the CPU run time and the relative £2 recovery 
error of each component for 50 simulations of Experiment 2 using POA for each parameter m 
and noise level er. 
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(a) Recovery of /?# without postprocessing. 



(b) Recovery of /3$ with postprocessing. 




100 200 300 400 500 


100 200 300 400 500 


(c) Recovery of 91c(/3^). (d) Recovery of 3m(/3^). 

Figure 2: The separation of components from a typical simulation of Experiment 2 with m = 
2, a = 0.05. The plots in the top row are of the recovered values of the component /3$ without 
and with postprocessing. The component /3<f is complex-valued, so the recovery of its real and 
imaginary parts with postprocessing are displayed separately in the second row. In each plot, 
the exact values are denoted by “+’ and the values recovered by POA are denoted by ‘o’. 
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a = 

Mean 

0.01 

Std Dev 

a = 

Mean 

0.05 

Std Dev 

Time 

2.9154 

5.7204xl0 -2 

2.8984 

8.6182xl0 -2 

E(J3) 

2.3460xl0" 3 

2.0277xl0 -3 

5.3163xl0 -3 

3.2198xl0 -3 

E(J3*) 

2.0391xl0" 3 

1.5152xl0 -3 

3.7928xl0 -3 

2.3483xl0“ 3 

E(fa) 

3.8823xl0" 2 

3.3788xl0 -2 

8.4469xl0" 2 

5.9878xl0 -2 


Table 3: The means and standard deviations of the run time of POA and the relative £2 recovery 
norm of the composite signal and each component over 50 simulations of Experiment 3 for noise 
levels a = 0.01 and 0.05. 


4.3 Experiment 3 

In this experiment, a specified composite signal is separated into distinct components. The composite signal 
/3 is formed by combining the discretized sinusoidal component 0$, and the sparse spike component where 


( 2.7TX \ / 7 XX \ 

- \ + sin ^ —J , for x e {0,1,..., 1023}, 


and /3-q, is formed by selecting a set S of size s and sampling on S using the same method applied to 
c in Experiment 2. The overcomplete dictionary B is the concatenation of the identity and the discrete 
Fourier transform matrices, each of size p x p. Note that in this experiment, the exact coefficient vector 
is not directly specified, so the signal is observed according to y = X/3 + z. The parameters used are 
p = 1024, n = 512, s = 57, with stopping criteria e = 10 -6 and p = 6 for a = 0.01 and p = 30 for a = 0.05. 

A typical plot of the composite signal, the individual components, their recoveries and pointwise recovery 
error with a = 0.01 are shown in Figure [3] Table [3] displays the means and standard deviations of the run 
time of POA and relative £2 recovery error of the signal and each component over 50 simulations for each 
noise level. □ 


4.4 Experiment 4 

In this experiment, composite signals of two handwritten digits are classified and separated using POA and 
standard principal component analysis techniques. The data are taken from the USPS handwritten digit 
data sets, obtained from [I]. The data set contains 10 classes, labeled zero through nine, of 1100 examples of 
8-bit grayscale 16 x 16 images. Each image is reshaped as 256-column vectors. In each class, 998 examples are 
used as the training set and the remaining 102 examples form the testing set. Denote by R[j] the collection 
of vectors forming the training set, and by T[j] the collection of vectors forming the testing set. By an abuse 
of notation, also denote by R[j] the matrix whose columns come from the j th collection of training vectors. 
The composite vector /3 is sampled from the test data by randomly selecting two vectors from two distinct 
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(a) The exact signal /?. (b) Recovery error 0 — 0. 




(c) The recovered component D\c(0$). (d) The recovered component 0-q,. 

Figure 3: The separation of components from a typical simulation of Experiment 3 with noise 
level a = 0.01. Subplot (a) is the exact composite signal 0, which is then undersampled via 
y = X/3 + z. Subplot (b) is the pointwise error 0 — 0, where 0 is the recovered signal. Subplots 
(c) and (d) are the individual recovered components 91c(/3$) and (3^ separated using POA and 
Model ©. 


13 







































test sets, fa £ Tj 1 and fa £ Tj 2 , and taking /3 = fa + fa. Consider the observation y = Xfa where X is a 
random Bernoulli 128 x 256 sensing matrix. The overcomplete dictionary is learned from the labeled training 
sets by concatenating the first few principal components of each matrix R.[j\. Using Model ([2]) and POA, 
we classify the two digits in the composition fa and recover an approximation of the two digits using the 
procedure outlined in Algorithm [2] 


Algorithm 2 Composite Handwritten Digit Classification and Separation 

(1) : Input the observation y, the training sets i?[0],. .., -R[9], and a positive integer k < 256. 

(2) : For each j, compute U[j], a matrix whose columns are the first k principal components of the matrix 
R[j]. The overcomplete dictionary is learned from the training data via 

b = [u[ o] m ••• m]. 

(3) : Apply POA to y to produce a sparse coefficient vector c satisfying Model (0 and to recover the 
composite vector $ = Be. 

(4) : Classify the components of /3 as the indices ji, j 2 yielding the smallest two values of 

\(h 56 -U[j]U[j}*)p\\ 2 . 

(5) : Form a reduced dictionary 

B=[U\h] Ufa}]. 

(6) : Apply POA again to y using B to obtain a new coefficient vector c = [c^ cj 2 ] T satisfying Model 0. 
The two unknown components of /3 are approximated as 

Ph = Ufa]c h , and fa 2 = Ufa\c h . 


Let us explain the motivation behind Algorithm [2] Since each unknown digit in the composite vector 
should be described well by its corresponding principal vectors, it follows that one can approximate the 
composite vector by /3 = Be for a sparse coefficient vector c, and Model 0 is appropriate to use on y. 
In Step (2), the hrst k principal components of each training set R[j] is computed and used to form the 
overcomplete dictionary. One possible method to find each U[j] is to use the singular value decomposition. 
If UYiV* is a singular value decomposition of the R[j], then U[j] can be taken as the first k columns of the 
collection of left singular vectors U. In Step (4), the recovered composite vector $ is projected onto the 
vector spaces spanned by the first k principal components of each training class, and the components are 
classified according to the residual vectors giving the smallest fa norm. In Steps (5) and (6), the dictionary is 
reduced based on the most significant principal components identified in Step (4). Reducing the dictionary 
results in a cleaner separation since only coefficients in the identified classes will be recovered. 

Algorithm [2] was performed 1000 times for different observations y with k = 30, and the recovered classi¬ 
fication indices were compared with the true classification of the two components, as well as the classification 
determined using the smallest residual error from the composite image fa In 95.4% of the simulations, the 
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classification accuracy of $ generated by POA given only the random projections y matched or exceeded the 
classification accuracy of the exact image /3. Examples of the separation of two composite images sampled 
from the testing data sets using Algorithm [2] is illustrated in Figure 21 □ 



Figure 4: Two illustrations of the approximation of individual components composite images 
using POA in Algorithm [2] with an overcomplete dictionary learned from training examples as 
in Experiment 4. In each collection, reading left to right, the top row is the exact composite 
image /3 followed by the exact components. The bottom row is the recovered composite image 
j3 followed by the recovered components $j 1 and /3 J2 . 


5 Conclusion 

The strength of Model © and POA in separating noisy undersampled composite signals is demonstrated 
through the numerical experiments in the previous section. Figures [21 [3] and 0] and Tables [U [21 and [3] clearly 
illustrate components of a composite signal are separated using POA with a high degree of precision. In 
particular, Figure [21 demonstrates the method’s ability to distinguish mixed signals that have similar dynamic 
range, Figure [3] demonstrates the effectiveness of this method to separate a smooth signal from unwanted 
impulse noise and Figure 0] demonstrates POA can be used to separate overlaid images using an overcomplete 
dictionary training from labeled data. 

The relative £2 norm error of the recovery of each of the components remains fairly stables as m increases 
for each < 7 , as shown in Tables [T] and 0] This suggests that the method is not greatly affected by the system 
size, and is stable and reliable in practice. 

Comparisons to ADM demonstrate the advantages of using POA to separate a noisy undersampled com¬ 
posite signal and estimate solutions of Model © . When the underlying coefficient vector and overcomplete 
dictionary have only real-valued elements, the two methods have similar accuracy yet POA is significantly 
faster (Figure [I] Table 0]). Moreover, POA can be readily applied to models involving complex-valued over¬ 
complete dictionaries and coefficients, but the ADM used for comparison was designed for convex optmization 
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in real-valued domains only. In this sense, POA applies to a wider class of problems. 

In summary, we have introduced a model for the Dantzig selector incorporating overcomplete dictionaries 
that can be used to separate composite signals. Additionally, we have proposed POA, an iterative algorithm 
to estimate solutions to Model (121) and have given the results of several numerical experiments to support the 
strength of both the model and the algorithm. We have shown through the numerical experiments that POA 
is preferable over the competing method ADM, since POA produces results with similar accuracy but with 
less CPU time and is applicable to a wider class of problems. Some possible future applications based on 
the foundation of the separation of composite signals include medical imaging, image inpainting and feature 
extraction. Moreover, advances in finding sparse descriptions of noisy composite signals can be applied to 
improve technologies in signal compression and communications. 
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